Abstract

RNA-RNA binding is an important phenomenon observed for many classes of non-coding 
RNAs and plays a crucial role in a number of regulatory processes. Recently, several minimum free energy (MFE)
folding algorithms for predicting the joint structure of two interacting RNA molecules have 
been proposed. Here joint structure means that in a diagram representation the intramolecular 
bonds of each partner are pseudoknot-free, that the intermolecular binding pairs are noncrossing, 
and that there is no so-called "zig-zag" configuration. This paper presents the combinatorics of 
RNA interaction structures including their generating function, singularity analysis as well as 
explicit recurrence relations. In particular, our results imply simple asymptotic formulas for the 
number of joint structures. 
Keywords: singularity analysis.



1. Introduction 



RNA-RNA binding is an important phenomenon observed in various classes of non-coding RNAs and plays a crucial role in a number of regulatory processes. Examples include the regulation of translation in both prokaryotes (Narberhaus et al., 2007) and eukaryotes (McManus et al., 2002; Banerjee et al., 2002), the targeting of chemical modifications (Bachellerie et al., 2002), insertion editing (Benne, 1992), and transcriptional control (Kugel and Goodrich, 2007). More and more evidence suggests that RNA-RNA interactions also play a role in the functionality of long mRNA-like ncRNAs. A common theme in many RNA classes, including miRNAs, snRNAs, gRNAs, snoRNAs, and in particular many of the prokaryotic small RNAs, is the formation of RNA-RNA interaction structures that are much more complex than simple complementary sense-antisense interactions. The interaction between two RNAs is governed by the same physical principles that determine RNA folding: the formation of specific base pair patterns whose energy is largely determined by base pair stacking and loop strains. Therefore, secondary structures are an appropriate level of description to quantitatively understand the thermodynamics of RNA-RNA binding.

Fig. 1. RNA-RNA interaction structures and their prediction. The primary interaction region(s) are highlighted in red in the experimentally supported structural models from the literature: ompA-MicA (Udekwu et al., 2005); sodB-RyhB (Geissmann and Touati, 2004); fhlA-OxyS (Chitsaz et al., 2009). Hybridization probabilities computed by rip are annotated by green boxes for regions with a probability larger than 10%.



Pervouchine (2004) and Alkan et al. (2006) proposed minimum free energy (MFE) folding algorithms for predicting the joint structure of two interacting RNA molecules. In this model, "joint structure" means that the intramolecular structures of each partner are pseudoknot-free, that the intermolecular binding pairs are noncrossing, and that there is no so-called "zig-zag" configuration, see Section 3 for details. This structure class seems to include all major interaction complexes. The optimal joint structure can be computed in $O(N^6)$ time and $O(N^4)$ space by means of dynamic programming (Alkan et al., 2006; Pervouchine, 2004; Huang et al., 2010; Chitsaz et al., 2009). More recently, extensions involving the partition function were proposed by Chitsaz et al. (2009) (piRNA) and Huang et al. (2009) (rip), see Fig. 1.



In contrast to the situation for RNA secondary structures (Waterman et al., 1978; Schmitt et al., 1994), little is known about the joint structures that are the folding targets of rip (Huang et al., 2010). This paper closes this gap and introduces the combinatorics of interaction structures. We present the generating function of joint structures, its singularity analysis as well as explicit recurrence relations. In particular, our results imply simple formulas for the asymptotic number of joint structures.
The paper is organized as follows: in Section 2 we provide several basic facts and context. In Section 3 we introduce joint structures along the lines of (Huang et al., 2009). In Section 4 we follow the ideas of (Reidys et al., 2010) and consider shapes of joint structures. In Section 5 we use shapes in order to compute the generating function of joint structures, and Section 6 deals with the singularity analysis. We then integrate our results in Section 7. Finally, we present additional results in Section 8.



2. Some basic facts 

2.1. Singularity analysis. Let $f(z) = \sum_{n \ge 0} a_n z^n$ be a generating function with nonnegative coefficients and a radius of convergence $R > 0$. Since explicit formulas for the coefficients $a_n$ can be very complicated or even impossible to obtain, we instead estimate $a_n$ in terms of the exponential factor $\gamma$ and the subexponential factor $P(n)$, that is, $a_n \sim P(n)\,\gamma^n$. The derivation of the exponential growth rate and the subexponential factor is mainly based on singularity analysis, a framework for analyzing the asymptotics of these coefficients. The key to obtaining asymptotic information about the coefficients of a generating function is its dominant singularities, which raises the question of how to locate them. In the particular case of power series with nonnegative coefficients and a radius of convergence $R > 0$, a theorem of Pringsheim (Flajolet, 2007; Titchmarsh, 1939) guarantees a positive real dominant singularity at $z = R$. As we are dealing here with combinatorial generating functions, we always have this dominant singularity. Furthermore, for all our generating functions it is the unique dominant singularity. The class of theorems that deduce information about the coefficients from the generating function are called transfer theorems (Flajolet, 2007).

To be precise, we say a function $f(z)$ is $\Delta_\rho$ analytic at its dominant singularity $z = \rho$ if it is analytic in some domain $\Delta_\rho(\phi, r) = \{z \mid |z| < r,\ z \ne \rho,\ |\mathrm{Arg}(z - \rho)| > \phi\}$ for some $\phi, r$, where $r > |\rho|$ and $0 < \phi < \pi/2$. We use the notation
$$\big(f(z) = \Theta(g(z)) \text{ as } z \to \rho\big) \;:\Longleftrightarrow\; \big(f(z)/g(z) \to c \text{ as } z \to \rho\big),$$
where $c$ is some constant. Let $[z^n]f(z)$ denote the coefficient of $z^n$ in the power series expansion of $f(z)$ at $z = 0$. Since the Taylor coefficients have the property
$$\forall \gamma \in \mathbb{C}\setminus\{0\}:\quad [z^n]f(z) = \gamma^n\,[z^n]f(z/\gamma),$$
we can, without loss of generality, reduce our analysis to the case where $z = 1$ is the unique dominant singularity. The next theorem transfers the asymptotic expansion of a function around its unique dominant singularity to the asymptotics of the function's coefficients.
Theorem 1. (Flajolet, 2002) Let $f(z)$ be a $\Delta_1$ analytic function at its unique dominant singularity $z = 1$. Let
$$g(z) = (1 - z)^{\alpha} \log^{\beta}\!\Big(\frac{1}{1 - z}\Big), \qquad \alpha, \beta \in \mathbb{R}.$$
Suppose that, in the intersection of a neighborhood of 1 with the $\Delta_1$ domain,
$$f(z) = \Theta(g(z)) \quad \text{for } z \to 1. \tag{2.1}$$
Then we have
$$[z^n]f(z) = \Theta\big([z^n]g(z)\big). \tag{2.2}$$

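Theorem 2. (Flajolet, 2002) Suppose $f(z) = (1 - z)^{-\alpha}$, $\alpha \in \mathbb{C}\setminus\mathbb{Z}_{\le 0}$. Then
$$[z^n]f(z) \sim \frac{n^{\alpha-1}}{\Gamma(\alpha)}\left[1 + \frac{\alpha(\alpha-1)}{2n} + \frac{\alpha(\alpha-1)(\alpha-2)(3\alpha-1)}{24n^2} + \frac{\alpha^2(\alpha-1)^2(\alpha-2)(\alpha-3)}{48n^3} + O\!\left(\frac{1}{n^4}\right)\right]. \tag{2.3}$$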
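As a quick numerical sanity check of (2.3), the following Python sketch compares the exact coefficient $[z^n](1-z)^{-\alpha}$, obtained from the binomial series, with the expansion of Theorem 2 truncated after zero to three correction terms. It takes $\alpha = -1/2$, the square-root case that reappears in Section 6. The helper names are ours and purely illustrative.

```python
import math


def exact_coeff(alpha: float, n: int) -> float:
    """[z^n] (1 - z)^(-alpha) via the binomial series: prod_{k<n} (alpha + k)/(k + 1)."""
    c = 1.0
    for k in range(n):
        c *= (alpha + k) / (k + 1)
    return c


def transfer_approx(alpha: float, n: int, terms: int = 3) -> float:
    """Expansion (2.3) of Theorem 2, keeping `terms` correction terms (0 to 3)."""
    e = [
        1.0,
        alpha * (alpha - 1) / 2,
        alpha * (alpha - 1) * (alpha - 2) * (3 * alpha - 1) / 24,
        alpha**2 * (alpha - 1) ** 2 * (alpha - 2) * (alpha - 3) / 48,
    ]
    correction = sum(e[k] / n**k for k in range(terms + 1))
    return n ** (alpha - 1) / math.gamma(alpha) * correction


# Square-root singularity f(z) = (1 - z)^(1/2), i.e. alpha = -1/2.
alpha, n = -0.5, 1000
exact = exact_coeff(alpha, n)
for t in range(4):
    rel_err = abs(transfer_approx(alpha, n, t) / exact - 1)
    print(f"{t} correction terms: relative error {rel_err:.2e}")
```

With no correction terms the relative error is of order $\alpha(\alpha-1)/(2n)$. Each additional term gains roughly one power of $1/n$.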



2.2. Symbolic Enumeration. Symbolic enumeration (Flajolet, 2007) plays an important role in the following computations. We first introduce the notion of a combinatorial class. Let $\mathbf{z} = (z_1, \ldots, z_d)$ be a vector of $d$ formal variables and $\mathbf{k} = (k_1, \ldots, k_d)$ be a vector of integers of the same dimension. We use the simplified notation $\mathbf{z}^{\mathbf{k}} := z_1^{k_1} z_2^{k_2} \cdots z_d^{k_d}$.

Definition 1. A combinatorial class of dimension $d$, or simply a class, is an ordered pair $(\mathcal{A}, w_{\mathcal{A}})$, where $\mathcal{A}$ is a finite or denumerable set and the size function $w_{\mathcal{A}}\colon \mathcal{A} \to \mathbb{Z}_{\ge 0}^d$ satisfies that $w_{\mathcal{A}}^{-1}(\mathbf{n})$ is finite for any $\mathbf{n} \in \mathbb{Z}_{\ge 0}^d$.



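Given a class $(\mathcal{A}, w_{\mathcal{A}})$, the size of an element $a \in \mathcal{A}$ is denoted by $w_{\mathcal{A}}(a)$, or simply $w(a)$. We consistently denote by $\mathcal{A}_{\mathbf{n}}$ the set of elements in $\mathcal{A}$ that have size $\mathbf{n}$ and use the same group of letters for the cardinality $A_{\mathbf{n}} = |\mathcal{A}_{\mathbf{n}}|$. The sequence $\{A_{\mathbf{n}}\}$ is called the counting sequence of the class $\mathcal{A}$. The generating function of a class $(\mathcal{A}, w_{\mathcal{A}})$ is given by
$$A(\mathbf{z}) = \sum_{a \in \mathcal{A}} \mathbf{z}^{w(a)} = \sum_{\mathbf{n}} A_{\mathbf{n}}\, \mathbf{z}^{\mathbf{n}}.$$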

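There are two special classes, $\mathcal{E}$ and $\mathcal{Z}_i$, which each contain only one element, of size $\mathbf{0}$ and $\mathbf{e}_i$ (the $i$-th unit vector), respectively. In particular, the generating functions of the classes $\mathcal{E}$ and $\mathcal{Z}_i$ are
$$E(\mathbf{z}) = 1 \quad\text{and}\quad Z_i(\mathbf{z}) = z_i.$$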



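We adhere in the following to a systematic naming convention: classes, their counting sequences, and their generating functions are systematically denoted by the same groups of letters: for instance, $\mathcal{C}$ for a class, $\{C_{\mathbf{n}}\}$ for the counting sequence, and $C(\mathbf{z})$ for its generating function. Let $\mathcal{A}$ and $\mathcal{B}$ be combinatorial classes of dimension $d$. Suppose $\mathcal{A}_i$ are combinatorial classes of dimension 1. We define

• $(\mathcal{A}_1, \mathcal{A}_2) := \{c = (a_1, a_2) \mid a_i \in \mathcal{A}_i\}$ and for $c = (a_1, a_2) \in (\mathcal{A}_1, \mathcal{A}_2)$,
$$w_{(\mathcal{A}_1, \mathcal{A}_2)}(c) = \big(w_{\mathcal{A}_1}(a_1),\, w_{\mathcal{A}_2}(a_2)\big),$$

• $\mathcal{A} + \mathcal{B} := \mathcal{A} \cup \mathcal{B}$ if $\mathcal{A} \cap \mathcal{B} = \varnothing$, and for $c \in \mathcal{A} + \mathcal{B}$,
$$w_{\mathcal{A}+\mathcal{B}}(c) = \begin{cases} w_{\mathcal{A}}(c) & \text{if } c \in \mathcal{A},\\ w_{\mathcal{B}}(c) & \text{if } c \in \mathcal{B},\end{cases}$$

• $\mathcal{A} \times \mathcal{B} := \{c = (a, b) \mid a \in \mathcal{A}, b \in \mathcal{B}\}$ and for $c \in \mathcal{A} \times \mathcal{B}$,
$$w_{\mathcal{A}\times\mathcal{B}}(c) = w_{\mathcal{A}}(a) + w_{\mathcal{B}}(b),$$

• $\mathrm{SEQ}(\mathcal{A}) := \mathcal{E} + \mathcal{A} + (\mathcal{A} \times \mathcal{A}) + (\mathcal{A} \times \mathcal{A} \times \mathcal{A}) + \cdots$.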

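Plainly, $\mathrm{SEQ}(\mathcal{A})$ defines a proper combinatorial class if and only if $\mathcal{A}$ contains no element of size $\mathbf{0}$. We immediately observe

Proposition 1. Suppose $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$ are combinatorial classes of dimension $d$ having the generating functions $A(\mathbf{z})$, $B(\mathbf{z})$ and $C(\mathbf{z})$. Let $\mathcal{A}_i$ be combinatorial classes of dimension 1 having the generating functions $A_i(z)$. Then

(a) $\mathcal{C} = (\mathcal{A}_1, \mathcal{A}_2, \ldots, \mathcal{A}_d) \implies C(\mathbf{z}) = A_1(z_1)\, A_2(z_2) \cdots A_d(z_d)$,

(b) $\mathcal{C} = \mathcal{A} + \mathcal{B} \implies C(\mathbf{z}) = A(\mathbf{z}) + B(\mathbf{z})$,

(c) $\mathcal{C} = \mathcal{A} \times \mathcal{B} \implies C(\mathbf{z}) = A(\mathbf{z}) \cdot B(\mathbf{z})$,

(d) $\mathcal{C} = \mathrm{SEQ}(\mathcal{A}) \implies C(\mathbf{z}) = \dfrac{1}{1 - A(\mathbf{z})}$.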
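In the univariate case ($d = 1$), Proposition 1 turns class constructions into operations on coefficient lists. The sketch below implements (b), (c) and (d) on power series truncated at a fixed degree. The helper names are hypothetical. As an example, $\mathrm{SEQ}(\mathcal{Z} + \mathcal{Z} \times \mathcal{Z})$ counts compositions of $n$ into parts 1 and 2, so its coefficients are Fibonacci numbers.

```python
def plus(a, b):
    """(b): disjoint union A + B  ->  A(z) + B(z)."""
    return [x + y for x, y in zip(a, b)]


def times(a, b):
    """(c): Cartesian product A x B  ->  A(z) * B(z), truncated to len(a) terms."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n - i):
            c[i + j] += a[i] * b[j]
    return c


def seq(a):
    """(d): SEQ(A)  ->  1 / (1 - A(z)); requires that A has no element of size 0."""
    if a[0] != 0:
        raise ValueError("SEQ(A) is only a proper class if A_0 = 0")
    c = [1] + [0] * (len(a) - 1)
    for k in range(1, len(a)):
        # coefficient of z^k in C = 1 + A * C
        c[k] = sum(a[i] * c[k - i] for i in range(1, k + 1))
    return c


N = 8
Z = [0, 1] + [0] * (N - 1)              # atomic class: a single element of size 1
print(seq(Z))                            # [1, 1, 1, 1, 1, 1, 1, 1, 1]
print(seq(plus(Z, times(Z, Z))))         # [1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Computing $1/(1 - A)$ through the recursion $C = 1 + A\,C$ mirrors the combinatorial decomposition of a sequence into its first component and the rest.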

2.3. Secondary structures. Let $f(n)$ denote the number of all noncrossing matchings of $n$ arcs, with generating function $F(z) = \sum_{n \ge 0} f(n)\, z^n$. Recursions for $f(n)$ allow us to derive
$$z F(z)^2 - F(z) + 1 = 0,$$
that is, we have
$$F(z) = \frac{1 - \sqrt{1 - 4z}}{2z}.$$
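The recursion behind $zF(z)^2 - F(z) + 1 = 0$ is the first-arc decomposition. The arc incident to the leftmost vertex encloses one noncrossing matching and is followed by another. The short sketch below is our own illustration. It checks the resulting counts against the Catalan numbers $\binom{2n}{n}/(n+1)$, which are the coefficients of $F(z)$.

```python
from math import comb


def noncrossing_matchings(n_max):
    """f(0..n_max): the arc at the leftmost vertex encloses a noncrossing matching
    with k arcs and is followed by one with n - 1 - k arcs."""
    f = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        f[n] = sum(f[k] * f[n - 1 - k] for k in range(n))
    return f


f = noncrossing_matchings(12)
# The coefficients of F(z) = (1 - sqrt(1 - 4z)) / (2z) are the Catalan numbers.
assert all(f[n] == comb(2 * n, n) // (n + 1) for n in range(13))
print(f[:8])    # [1, 1, 2, 5, 14, 42, 132, 429]
```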

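Let $\mathcal{T}_\sigma$ denote the combinatorial class of $\sigma$-canonical secondary structures (that is, secondary structures in which every stack has length at least $\sigma$) having arc-length $\ge 2$. Let $T_\sigma(n)$ denote the number of such structures with $n$ vertices, and let
$$T_\sigma(z) = \sum_{n \ge 0} T_\sigma(n)\, z^n.$$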
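Theorem 3. Suppose $\sigma \in \mathbb{N}$, $\sigma \ge 1$, and
$$u_\sigma(z) = \frac{(z^2)^{\sigma-1}}{z^{2\sigma} - z^2 + 1}.$$
Then we have
$$T_\sigma(z) = \frac{1}{u_\sigma(z) z^2 - z + 1}\; F\!\left(\frac{u_\sigma(z) z^2}{\left(u_\sigma(z) z^2 - z + 1\right)^2}\right),$$
where
$$F(z) = \frac{1 - \sqrt{1 - 4z}}{2z}.$$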
Since $F(z)$ is algebraic and $u_\sigma(z)$ is a rational function, Theorem 3 implies that $T_\sigma(z)$ is an algebraic function for any $\sigma$.
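Theorem 3 can be evaluated mechanically with truncated power series. The sketch below expands $u_\sigma(z)z^2$, the argument of $F$, and $F$ itself with integer arithmetic; the helper names are ours. For $\sigma = 1$ it reproduces the counts $1, 1, 1, 2, 4, 8, 17, \ldots$ of secondary structures with arc-length $\ge 2$. For $\sigma = 2$ the first nonempty structure appears at $n = 5$: the stack $(1,5), (2,4)$.

```python
from math import comb


def mul(a, b):
    """Product of two truncated power series given as equal-length coefficient lists."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        if ai:
            for j in range(n - i):
                c[i + j] += ai * b[j]
    return c


def inverse(a):
    """1 / a(z) for a series with a[0] == 1; integer coefficients stay integral."""
    assert a[0] == 1
    b = [1] + [0] * (len(a) - 1)
    for k in range(1, len(a)):
        b[k] = -sum(a[i] * b[k - i] for i in range(1, k + 1))
    return b


def T_sigma(sigma, N):
    """Coefficients T_sigma(0..N) from Theorem 3 (requires N >= 2 * sigma)."""
    assert sigma >= 1 and N >= 2 * sigma
    den = [1] + [0] * N
    den[2] -= 1
    den[2 * sigma] += 1                 # z^(2 sigma) - z^2 + 1
    num = [0] * (N + 1)
    num[2 * sigma] = 1                  # u_sigma(z) z^2 = z^(2 sigma) / den
    uz2 = mul(num, inverse(den))
    a = uz2[:]
    a[0] += 1
    a[1] -= 1                           # a(z) = u_sigma(z) z^2 - z + 1
    inv_a = inverse(a)
    x = mul(uz2, mul(inv_a, inv_a))     # argument of F; it has no constant term
    F, power = [0] * (N + 1), [1] + [0] * N
    for k in range(N + 1):              # F(x) = sum_k Catalan(k) x^k
        catalan = comb(2 * k, k) // (k + 1)
        F = [f + catalan * p for f, p in zip(F, power)]
        power = mul(power, x)
    return mul(inv_a, F)


print(T_sigma(1, 9))   # [1, 1, 1, 2, 4, 8, 17, 37, 82, 185]
print(T_sigma(2, 7))   # [1, 1, 1, 1, 1, 2, 4, 8]
```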



3. Joint Structures

Given two RNA sequences $R$ and $S$ with $n$ and $m$ vertices, we index the vertices such that $R_1$ is the 5′ end of $R$ and $S_1$ is the 3′ end of $S$. We refer to the $i$th vertex in $R$ by $R_i$ and to the subgraph induced by $\{R_i, \ldots, R_j\}$ by $R[i,j]$. An intramolecular base pair is represented by an (interior) arc whose two endpoints are both contained in either $R$ or $S$. Similarly, an intermolecular base pair is represented by an (exterior) arc with one endpoint in $R$ and the other in $S$. A pre-structure $G(R, S, I)$ is a graph consisting of two secondary structures $R$ and $S$ together with a set $I$ of noncrossing exterior arcs. When representing arc configurations, we draw all $R$-arcs in the upper half-plane and all $S$-arcs in the lower half-plane, see Fig. 2 (A).

The subgraph $R[i,j]$ ($S[i',j']$) is called a secondary segment if there is no exterior arc $R_kS_{k'}$ such that $i \le k \le j$ ($i' \le k' \le j'$), see Fig. 2 (A). An interior arc $R_iR_j$ is an $R$-ancestor of the exterior arc $R_kS_{k'}$ if $i < k < j$. Analogously, $S_{i'}S_{j'}$ is an $S$-ancestor of $R_kS_{k'}$ if $i' < k' < j'$. In this situation we also refer to $R_kS_{k'}$ as a descendant of $R_iR_j$ and $S_{i'}S_{j'}$, see Fig. 2 (A). Furthermore, we call $R_iR_j$ and $S_{i'}S_{j'}$ dependent if they have a common descendant, and independent otherwise. Let $R_iR_j$ and $S_{i'}S_{j'}$ be two dependent interior arcs. Then $R_iR_j$ subsumes $S_{i'}S_{j'}$, or $S_{i'}S_{j'}$ is subsumed in $R_iR_j$, if for any $R_kS_{k'} \in I$, $i' < k' < j'$ implies $i < k < j$. That is, the set of descendants of $S_{i'}S_{j'}$ is contained in the set of descendants of $R_iR_j$, see Fig. 2 (A). Subsumption of an $R$-arc in an $S$-arc is defined symmetrically. A zigzag is a subgraph containing two dependent interior arcs $R_{i_1}R_{j_1}$ and $S_{i_2}S_{j_2}$, neither one subsuming the other, see Fig. 2 (B). A joint structure $J(R, S, I)$ is a zigzag-free pre-structure, see Fig. 2 (A).

We denote the combinatorial class of all joint structures by $\mathcal{J}$ and define the size function as $w_{\mathcal{J}}(J(R, S, I)) = (n, m, h)$. Here $n$ and $m$ denote the number of vertices in the top and bottom sequence, and $h$ denotes the number of exterior arcs in the joint structure. We denote by $\mathcal{J}(n, m, h)$ the subset of $\mathcal{J}$ containing all joint structures of size $(n, m, h)$ and set the counting sequence $J(n, m, h) = |\mathcal{J}(n, m, h)|$. The generating function of the class $\mathcal{J}$ is given by
$$J(x, y, z) = \sum_{n, m, h} J(n, m, h)\, x^n y^m z^h.$$

Fig. 2. (A): The joint structure $J(R,S,I)$ with arc-length $\ge 3$, interior stack-length $\ge 2$ and exterior stack-length $\ge 3$. Secondary segments (red): the subgraphs $R[16,21]$ and $S[10,15]$. Ancestors and descendants: for the exterior arc $R_5S_5$, the sets of $R$-ancestors and $S$-ancestors are $\{R_1R_{15}, R_2R_{14}, R_3R_9, R_4R_8\}$ and $\{S_1S_{21}, S_2S_{20}, S_3S_9, S_4S_8\}$. The exterior arc $R_5S_5$ is a common descendant of $R_1R_{15}$ and $S_3S_9$, while $R_{10}S_{17}$ is not. Subsumed arcs: $R_1R_{15}$ subsumes $S_3S_9$ and $S_1S_{21}$. (B): A zigzag, generated by $R_2S_1$, $R_3S_3$ and a third exterior arc.
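The definitions above translate directly into a checker. The following sketch is a hypothetical helper, not part of the original work. Arcs are index pairs: $R$-arcs $(i, j)$ and $S$-arcs $(i', j')$ with the smaller index first, and exterior arcs $(k, k')$ for $R_kS_{k'}$. It tests whether a pre-structure is well formed and zigzag-free. Arc-length and stack-length restrictions are deliberately not checked.

```python
from itertools import combinations


def crossing_free(arcs):
    """Interior arcs (i, j), i < j, are noncrossing iff no pair has i < k < j < l."""
    for (i, j), (k, l) in combinations(sorted(arcs), 2):
        if i < k < j < l:
            return False
    return True


def descendants(arc, exterior, side):
    """Exterior arcs (k, k') below an R-arc (side=0) or an S-arc (side=1)."""
    i, j = arc
    return {e for e in exterior if i < e[side] < j}


def is_joint_structure(r_arcs, s_arcs, exterior):
    """Hypothetical checker: True iff (R, S, I) is a zigzag-free pre-structure."""
    r_ends = [v for arc in r_arcs for v in arc] + [k for k, _ in exterior]
    s_ends = [v for arc in s_arcs for v in arc] + [kp for _, kp in exterior]
    if len(set(r_ends)) != len(r_ends) or len(set(s_ends)) != len(s_ends):
        return False                      # some vertex lies on two arcs
    if not (crossing_free(r_arcs) and crossing_free(s_arcs)):
        return False                      # R or S is not a secondary structure
    for (k, kp), (l, lp) in combinations(exterior, 2):
        if (k - l) * (kp - lp) < 0:
            return False                  # two exterior arcs cross
    for ra in r_arcs:
        d_r = descendants(ra, exterior, 0)
        for sa in s_arcs:
            d_s = descendants(sa, exterior, 1)
            if d_r & d_s and not (d_s <= d_r or d_r <= d_s):
                return False              # dependent, neither subsumes: zigzag
    return True


print(is_joint_structure([(1, 3)], [(1, 3)], [(2, 2)]))                  # True
print(is_joint_structure([(2, 5)], [(1, 4)], [(1, 2), (3, 3), (4, 5)]))  # False
```

The first example is a tight structure of type $\square$: both interior arcs have the same single descendant. In the second, $R_2R_5$ and $S_1S_4$ share the descendant $R_3S_3$, but each also has a descendant the other lacks, which forms a zigzag.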

We next specify some notation:

• an interior arc (or simply arc) of length $\lambda$ is an arc $R_iR_j$ ($S_{i'}S_{j'}$) where $j - i = \lambda$ ($j' - i' = \lambda$),

• an interior stack (or simply stack) of length $\sigma$ is a maximal sequence of "parallel" interior arcs,
$$(R_iR_j, R_{i+1}R_{j-1}, \ldots, R_{i+\sigma-1}R_{j-\sigma+1}) \quad\text{or}\quad (S_{i'}S_{j'}, S_{i'+1}S_{j'-1}, \ldots, S_{i'+\sigma-1}S_{j'-\sigma+1}),$$

• an exterior stack of length $\tau$ is a maximal sequence of "parallel" exterior arcs,
$$(R_iS_{i'}, R_{i+1}S_{i'+1}, \ldots, R_{i+\tau-1}S_{i'+\tau-1}).$$

Let $\mathcal{J}^{[\lambda]}_{\sigma,\tau}$ denote the class of all joint structures with arc-length $\ge \lambda$, interior stack-length $\ge \sigma$ and exterior stack-length $\ge \tau$. Similarly, we define its counting sequence $J^{[\lambda]}_{\sigma,\tau}(n, m, h)$ and generating function $J^{[\lambda]}_{\sigma,\tau}(x, y, z)$. In case of $\lambda = 2$, we omit $\lambda$ in the notation. If there is no restriction on the interior and exterior stack-length, we also omit further indices. In the particular case $\sigma = \tau$, we just write $\sigma$ in the notation and omit $\tau$. In Fig. 2 (A), we give an example of a joint structure with arc-length $\ge 3$, interior stack-length $\ge 2$ and exterior stack-length $\ge 3$.



We denote the subgraph of a joint structure $J(R, S, I)$ induced by a pair of subsequences $\{R_i, \ldots, R_j\}$ and $\{S_{i'}, S_{i'+1}, \ldots, S_{j'}\}$ as the block $J_{i,j;i',j'}$. Given a joint structure $J(R, S, I)$, a tight structure of $J(R, S, I)$ is the minimal block $J_{i,j;i',j'}$ containing all the $R$-ancestors and $S$-ancestors of any exterior arc in $J_{i,j;i',j'}$ and all the descendants of any interior arc in $J_{i,j;i',j'}$. In the following, a tight structure is denoted by $J^T_{i,j;i',j'}$. In particular, we denote the joint structure $J(R, S, I)$ by $J^T(R, S, I)$ if $J(R, S, I)$ is a tight structure of itself. For any joint structure, there are only four types of tight structures $J^T_{i,j;i',j'}$, namely $\{\circ, \vee, \wedge, \square\}$, denoted by $J^{\{\circ,\vee,\wedge,\square\}}_{i,j;i',j'}$, respectively. The four types of tight structures are defined as follows:

$\circ$: $\{R_iS_{i'}\} = J^T_{i,j;i',j'}$ and $i = j$, $i' = j'$;

$\vee$: $R_iR_j \in J^T_{i,j;i',j'}$ and $S_{i'}S_{j'} \notin J^T_{i,j;i',j'}$;

$\wedge$: $S_{i'}S_{j'} \in J^T_{i,j;i',j'}$ and $R_iR_j \notin J^T_{i,j;i',j'}$;

$\square$: $\{R_iR_j, S_{i'}S_{j'}\} \subseteq J^T_{i,j;i',j'}$.

The key role of tight structures is that they are the building blocks for the decomposition of joint structures.



Proposition 2. (Huang et al., 2009) Let $J(R, S, I)$ be a joint structure. Then

(1) any exterior arc $R_kS_{k'}$ in $J(R, S, I)$ is contained in a unique tight structure;

(2) $J(R, S, I)$ decomposes into a unique collection of tight structures and maximal secondary segments.



4. Shapes 

Definition 2. (Shape) A shape is a joint structure containing no secondary segments, in which each interior stack and each exterior stack has length exactly one.

Let $\mathcal{S}$ denote the combinatorial class of shapes. Given a joint structure, we obtain its shape by first removing all secondary segments and then collapsing each stack into a single arc. That is, we have a map $\varphi\colon \mathcal{J} \to \mathcal{S}$, see Fig. 3. Let $G(t_1, t_2, h)$ denote the number of shapes having $t_1$ arcs in the top sequence, $t_2$ arcs in the bottom sequence and $h$ exterior arcs, with generating function
$$G(u, v, z) = \sum_{t_1, t_2, h} G(t_1, t_2, h)\, u^{t_1} v^{t_2} z^h.$$
Fig. 3. Joint structures and their shapes: a joint structure (left) is projected into its shape (right).

We next introduce tight shapes, double tight shapes, closed shapes, right closed shapes and interaction segments:

• A tight shape is a shape that is tight as a structure. Let $\mathcal{S}^T$ denote the class of tight shapes and $G^T(t_1, t_2, h)$ the number of tight shapes having $t_1$ arcs in the top sequence, $t_2$ arcs in the bottom and $h$ exterior arcs, with generating function
$$G^T(u, v, z) = \sum_{t_1, t_2, h} G^T(t_1, t_2, h)\, u^{t_1} v^{t_2} z^h.$$
Any tight shape is of exactly one of the four types $\{\circ, \vee, \wedge, \square\}$. The corresponding classes and generating functions, $\mathcal{S}^{\{\circ,\vee,\wedge,\square\}}$ and $G^{\{\circ,\vee,\wedge,\square\}}(u, v, z)$ respectively, are defined accordingly.

• A double tight shape is a shape whose leftmost and rightmost blocks are tight structures. Let $\mathcal{S}^{dt}$ denote the class of double tight shapes and $G^{dt}(t_1, t_2, h)$ the number of double tight shapes having $t_1$ arcs in the top sequence, $t_2$ arcs in the bottom and $h$ exterior arcs, with generating function
$$G^{dt}(u, v, z) = \sum_{t_1, t_2, h} G^{dt}(t_1, t_2, h)\, u^{t_1} v^{t_2} z^h.$$

• A closed shape is a tight shape of type $\vee$, $\wedge$ or $\square$. Let $\mathcal{S}^{c}$ denote the class of closed shapes and $G^{c}(t_1, t_2, h)$ the number of closed shapes having $t_1$ arcs in the top sequence, $t_2$ arcs in the bottom and $h$ exterior arcs, with generating function
$$G^{c}(u, v, z) = \sum_{t_1, t_2, h} G^{c}(t_1, t_2, h)\, u^{t_1} v^{t_2} z^h.$$

• A right closed shape is a shape whose rightmost block is a closed shape rather than an exterior arc. Let $\mathcal{S}^{rc}$ denote the class of right closed shapes and $G^{rc}(t_1, t_2, h)$ the number of right closed shapes having $t_1$ arcs in the top sequence, $t_2$ arcs in the bottom and $h$ exterior arcs, with generating function
$$G^{rc}(u, v, z) = \sum_{t_1, t_2, h} G^{rc}(t_1, t_2, h)\, u^{t_1} v^{t_2} z^h.$$

• In a shape, an interaction segment is an empty structure or a tight structure of type $\circ$ (an exterior arc). We denote the class of interaction segments by $\mathcal{I}$ and the associated generating function by $I(u, v, z)$. Obviously, $I(u, v, z) = 1 + z$.

Theorem 4. The generating function $G(u, v, z)$ of shapes satisfies
$$A(u, v, z)\, G(u, v, z)^2 + B(u, v, z)\, G(u, v, z) + C(u, v, z) = 0, \tag{4.1}$$
where
$$A(u, v, z) = (u + v + uv)(z + 1), \quad B(u, v, z) = -\big((u + v + uv)(z + 2) + 1\big), \quad C(u, v, z) = (1 + u)(1 + v)(1 + z). \tag{4.2}$$

Proof. Proposition 2 implies that any shape can be decomposed into a unique collection of tight shapes. Furthermore, each shape can be decomposed into a unique collection of closed shapes and exterior arcs. We decompose a shape in four steps, see Fig. 4. Each decomposition step is translated into a construction of combinatorial classes in the language of symbolic enumeration.

Step (1): we decompose a shape into a right closed shape and a rightmost interaction segment. We generate $\mathcal{S} = \mathcal{S}^{rc} \times \mathcal{I} + \mathcal{I}$. It follows from Proposition 1 that
$$G(u, v, z) = G^{rc}(u, v, z)\, I(u, v, z) + I(u, v, z). \tag{4.3}$$

Step (2): we decompose a right closed shape into the rightmost closed shape and the rest, deriving
$$\mathcal{S}^{rc} = \mathcal{S} \times \mathcal{S}^{c},$$
whence
$$G^{rc}(u, v, z) = G(u, v, z)\, G^{c}(u, v, z). \tag{4.4}$$

Step (3): we decompose a closed shape depending on its type. The decomposition operation in this step can be viewed as the "removal" of an interior arc. We derive
$$\mathcal{S}^{\vee} = (\mathcal{Z}, \mathcal{E}, \mathcal{Z}) + (\mathcal{Z}, \mathcal{E}, \mathcal{E}) \times \mathcal{S}^{dt},$$
$$\mathcal{S}^{\wedge} = (\mathcal{E}, \mathcal{Z}, \mathcal{Z}) + (\mathcal{E}, \mathcal{Z}, \mathcal{E}) \times \mathcal{S}^{dt},$$
$$\mathcal{S}^{\square} = (\mathcal{Z}, \mathcal{Z}, \mathcal{Z}) + (\mathcal{Z}, \mathcal{Z}, \mathcal{E}) \times \mathcal{S}^{dt},$$
and obtain the generating functions
$$G^{c}(u, v, z) = G^{\vee}(u, v, z) + G^{\wedge}(u, v, z) + G^{\square}(u, v, z),$$
$$G^{\vee}(u, v, z) = uz + u\,G^{dt}(u, v, z), \quad G^{\wedge}(u, v, z) = vz + v\,G^{dt}(u, v, z), \quad G^{\square}(u, v, z) = uvz + uv\,G^{dt}(u, v, z). \tag{4.5}$$

Step (4): the class of double tight shapes arising from Step (3) is obtained by excluding the class of interaction segments and the class of closed shapes from the class of shapes, that is, $\mathcal{S}^{dt} = \mathcal{S} \setminus (\mathcal{I} \cup \mathcal{S}^{c})$. The corresponding generating function accordingly satisfies
$$G^{dt}(u, v, z) = G(u, v, z) - I(u, v, z) - G^{c}(u, v, z). \tag{4.6}$$

We proceed by solving the set of equations (4.3) to (4.6). Write $q = u + v + uv$ and use $I(u, v, z) = 1 + z$. Then (4.5) and (4.6) give $G^{c} = q\,(G - 1)/(1 + q)$. Substituting this into (4.4) and (4.3) and multiplying by $1 + q = (1 + u)(1 + v)$ yields the functional equation (4.1) for $G(u, v, z)$, and the theorem follows. □
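The grammar of Steps (1) to (4) can also be iterated directly as a fixed point on truncated power series. The sketch below specializes $u = v = z = t$, so it counts shapes by their total number of arcs. This is the function $U$ studied in Section 6.1. The iteration scheme and names are our own illustration. Each pass fixes at least one further coefficient, because $G^{c}$ always carries a factor $t$.

```python
def mul(a, b):
    """Product of two truncated power series (coefficient lists of equal length)."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        if ai:
            for j in range(n - i):
                c[i + j] += ai * b[j]
    return c


def add(a, b, sign=1):
    """a + sign * b, coefficientwise."""
    return [x + sign * y for x, y in zip(a, b)]


def shapes_by_arcs(N):
    """Coefficients of G(t, t, t) up to t^N from equations (4.3) to (4.6); N >= 2."""
    I = [1, 1] + [0] * (N - 1)                    # I = 1 + z at z = t
    t = [0, 1] + [0] * (N - 1)
    q = [0, 2, 1] + [0] * (N - 2)                 # u + v + uv at u = v = t
    G, Gc = [0] * (N + 1), [0] * (N + 1)
    for _ in range(N + 2):                        # N + 1 passes already suffice
        Gdt = add(add(G, I, -1), Gc, -1)          # (4.6)
        Gc = add(mul(q, t), mul(q, Gdt))          # sum of the three parts of (4.5)
        Grc = mul(G, Gc)                          # (4.4)
        G = add(mul(Grc, I), I)                   # (4.3)
    return G


print(shapes_by_arcs(5))    # [1, 1, 2, 5, 16, 53]
```

The small values can be confirmed by hand. With two arcs there are exactly two shapes: an exterior arc under a single top arc or under a single bottom arc. With three arcs there are five.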



5. The generating function 



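We proceed by generating joint structures from shapes via inflation. Let $\mathcal{J}_{\sigma,\tau}$ denote the class of joint structures with arc-length $\ge 2$, interior stack-length $\ge \sigma$ and exterior stack-length $\ge \tau$. Let $J_{\sigma,\tau}(n, m, h)$ denote the number of joint structures in $\mathcal{J}_{\sigma,\tau}$ having $n$ vertices in the top, $m$ vertices in the bottom and $h$ exterior arcs, with generating function
$$J_{\sigma,\tau}(x, y, z) = \sum_{n, m, h} J_{\sigma,\tau}(n, m, h)\, x^n y^m z^h.$$

Theorem 5. For $\sigma \ge 1$, $\tau \ge 1$, we have
$$J_{\sigma,\tau}(x, y, z) = T_\sigma(x)\, T_\sigma(y)\, G(\eta(x), \eta(y), \eta_0), \tag{5.1}$$
where
$$\eta(w) = \frac{w^{2\sigma}\, T_\sigma(w)^2}{1 - w^2 - w^{2\sigma}\left(T_\sigma(w)^2 - 1\right)}, \qquad \eta_0 = \frac{(xyz)^\tau\, T_\sigma(x)\, T_\sigma(y)}{1 - xyz - (xyz)^\tau\left(T_\sigma(x)\, T_\sigma(y) - 1\right)}.$$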

Proof. Let $\mathcal{S}(t_1, t_2, h)$ denote the class of shapes having $t_1$ interior arcs in the top, $t_2$ interior arcs in the bottom and $h$ exterior arcs. For any joint structure, we obtain a unique shape in $\mathcal{S}$ as follows:

(1) Remove all secondary segments.
(2) Contract each interior stack into one interior arc and each exterior stack into one exterior arc.

Fig. 4. The shape grammar. The notations of structural components are explained in the panel below. A: interaction segment; B: arbitrary shape $G(R,S,I)$; C: right closed shape $G^{rc}(R,S,I)$; D: double tight shape $G^{dt}(R,S,I)$; E: closed shape $G^{c}(R,S,I)$; F: type $\square$ tight shape $G^{\square}(R,S,I)$; G: type $\vee$ tight shape $G^{\vee}(R,S,I)$; H: type $\wedge$ tight shape $G^{\wedge}(R,S,I)$; I: type $\circ$ tight shape $G^{\circ}(R,S,I)$.

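Then we have the surjective map $\varphi\colon \mathcal{J}_{\sigma,\tau} \to \mathcal{S}$. Indeed, for any shape $\gamma$ in $\mathcal{S}$, we can construct joint structures with arc-length $\ge 2$, stack-length $\ge \sigma$ and exterior stack-length $\ge \tau$ that have shape $\gamma$. The map $\varphi$ induces the partition $\mathcal{J}_{\sigma,\tau} = \dot\bigcup_{\gamma \in \mathcal{S}} \varphi^{-1}(\gamma)$. Then we have
$$J_{\sigma,\tau}(x, y, z) = \sum_{\gamma \in \mathcal{S}} J_\gamma(x, y, z), \tag{5.2}$$
where $J_\gamma(x, y, z)$ denotes the generating function of the preimage $\varphi^{-1}(\gamma)$.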
Fig. 5. Step I: a shape (left) is inflated to a joint structure with arc-length $\ge 2$ and interior stack-length $\ge 2$. First, each interior arc in the shape is inflated to a stack of size at least two (middle). Then the shape is inflated to a new joint structure with arc-length $\ge 2$ and interior stack-length $\ge 2$ (right) by adding one stack of size two. Note that there are three ways to insert the secondary segments to separate the induced stacks (red).

We proceed by computing the generating function $J_\gamma(x, y, z)$. We construct $J_\gamma(x, y, z)$ from simpler combinatorial classes as building blocks: $\mathcal{M}_\sigma$ (stems), $\mathcal{K}_\sigma$ (stacks), $\mathcal{N}_\sigma$ (induced stacks), $\mathcal{R}$ (interior arcs) and $\mathcal{T}_\sigma$ (secondary segments). We inflate a shape $\gamma \in \mathcal{S}(t_1, t_2, h)$ to a joint structure in three steps.

Step I: we inflate any interior arc in $\gamma$ to a stack of size at least $\sigma$ and subsequently add additional stacks. The latter are called induced stacks and have to be separated by means of inserting secondary segments, see Fig. 5. Note that during this first inflation step no secondary segments are inserted other than those necessary for separating the nested stacks. We generate

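• secondary segments $\mathcal{T}_\sigma$ having stack-length $\ge \sigma$, with generating function $T_\sigma(z)$,

• interior arcs $\mathcal{R}$ with generating function $R(z) = z^2$,

• stacks, i.e. pairs consisting of the minimal sequence of arcs $\mathcal{R}^\sigma$ and an arbitrary extension consisting of arcs of arbitrary finite length,
$$\mathcal{K}_\sigma = \mathcal{R}^\sigma \times \mathrm{SEQ}(\mathcal{R}),$$
having the generating function
$$K_\sigma(z) = \frac{z^{2\sigma}}{1 - z^2},$$

• induced stacks, i.e. stacks together with at least one secondary segment on either or both of their sides,
$$\mathcal{N}_\sigma = \mathcal{K}_\sigma \times \left(\mathcal{T}_\sigma^2 - \mathcal{E}\right),$$
having the generating function
$$N_\sigma(z) = \frac{z^{2\sigma}}{1 - z^2}\left(T_\sigma(z)^2 - 1\right),$$

• stems, that is pairs consisting of stacks $\mathcal{K}_\sigma$ and an arbitrarily long sequence of induced stacks,
$$\mathcal{M}_\sigma = \mathcal{K}_\sigma \times \mathrm{SEQ}(\mathcal{N}_\sigma),$$
having the generating function
$$M_\sigma(z) = \frac{K_\sigma(z)}{1 - N_\sigma(z)} = \frac{z^{2\sigma}}{1 - z^2 - z^{2\sigma}\left(T_\sigma(z)^2 - 1\right)}.$$

Note that we inflate both the top and the bottom sequence. The corresponding generating function is
$$M_\sigma(x)^{t_1}\, M_\sigma(y)^{t_2}.$$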

Step II: we inflate any exterior arc in $\gamma$ to an exterior stack of size at least $\tau$ and subsequently add additional exterior stacks. The latter are called induced exterior stacks and have to be separated by means of inserting secondary segments, see Fig. 6. Note that during this exterior-arc inflation step no secondary segments are inserted other than those necessary for separating the stacks. We generate

• exterior arcs $\mathcal{R}_0$ having the generating function
$$R_0 = xyz,$$

• exterior stacks, i.e. pairs consisting of the minimal sequence of exterior arcs $\mathcal{R}_0^\tau$ and an arbitrary extension consisting of exterior arcs of arbitrary finite length,
$$\mathcal{K}'_\tau = \mathcal{R}_0^\tau \times \mathrm{SEQ}(\mathcal{R}_0),$$
having the generating function
$$K'_\tau = \frac{(xyz)^\tau}{1 - xyz},$$

• induced exterior stacks, i.e. exterior stacks together with at least one secondary segment on either or both of their sides (top or bottom),
$$\mathcal{N}'_\tau = \mathcal{K}'_\tau \times \left(\mathcal{T}_\sigma^2 - \mathcal{E}\right),$$
having the generating function
$$N'_\tau = \frac{(xyz)^\tau}{1 - xyz}\left(T_\sigma(x)\, T_\sigma(y) - 1\right),$$

• exterior stems, that is pairs consisting of exterior stacks $\mathcal{K}'_\tau$ and an arbitrarily long sequence of induced exterior stacks,
$$\mathcal{M}'_\tau = \mathcal{K}'_\tau \times \mathrm{SEQ}(\mathcal{N}'_\tau),$$
having the generating function
$$M'_\tau = \frac{K'_\tau}{1 - N'_\tau} = \frac{(xyz)^\tau}{1 - xyz - (xyz)^\tau\left(T_\sigma(x)\, T_\sigma(y) - 1\right)}.$$

We inflate all the exterior arcs, and the corresponding generating function is
$$(M'_\tau)^h.$$

Fig. 6. Step II: a joint structure (left) obtained in (1) in Fig. 5 is inflated to a joint structure in $\mathcal{J}_{2,2}$. First, each exterior arc in the joint structure is inflated to an exterior stack of size at least two (middle). Then the structure is inflated to a new joint structure in $\mathcal{J}_{2,2}$ (right) by adding one exterior stack of size two. There are three ways to insert the secondary segments to separate the induced exterior stacks (red).

Fig. 7. Step III: a joint structure (left) obtained in (1) in Fig. 6 is inflated to a new joint structure in $\mathcal{J}_{2,2}$ (right) by adding secondary segments (red).

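Step III: here we insert additional secondary segments at the remaining $2t_1 + h + 1$ positions in the top and the $2t_2 + h + 1$ positions in the bottom, see Fig. 7. Formally, the third inflation is expressed via the combinatorial class
$$\mathcal{T}_\sigma^{2t_1+h+1} \times \mathcal{T}_\sigma^{2t_2+h+1},$$
where the corresponding generating function is
$$T_\sigma(x)^{2t_1+h+1}\, T_\sigma(y)^{2t_2+h+1}.$$
Combining Step I, Step II and Step III we arrive at
$$J_\gamma(x, y, z) = M_\sigma(x)^{t_1}\, M_\sigma(y)^{t_2}\, (M'_\tau)^h\, T_\sigma(x)^{2t_1+h+1}\, T_\sigma(y)^{2t_2+h+1},$$
and accordingly
$$J_\gamma(x, y, z) = T_\sigma(x)\, T_\sigma(y)\, \big(T_\sigma(x)^2 M_\sigma(x)\big)^{t_1} \big(T_\sigma(y)^2 M_\sigma(y)\big)^{t_2} \big(M'_\tau\, T_\sigma(x)\, T_\sigma(y)\big)^{h}.$$
Since $J_\gamma(x, y, z) = J_{\gamma_1}(x, y, z)$ for any $\gamma, \gamma_1 \in \mathcal{S}(t_1, t_2, h)$, we derive from (5.2)
$$J_{\sigma,\tau}(x, y, z) = \sum_{t_1, t_2, h}\ \sum_{\gamma \in \mathcal{S}(t_1, t_2, h)} J_\gamma(x, y, z) = \sum_{t_1, t_2, h} G(t_1, t_2, h)\, T_\sigma(x)\, T_\sigma(y) \big(T_\sigma(x)^2 M_\sigma(x)\big)^{t_1} \big(T_\sigma(y)^2 M_\sigma(y)\big)^{t_2} \big(M'_\tau\, T_\sigma(x)\, T_\sigma(y)\big)^{h}.$$
Set
$$\eta(w) = T_\sigma(w)^2 M_\sigma(w) = \frac{w^{2\sigma}\, T_\sigma(w)^2}{1 - w^2 - w^{2\sigma}\left(T_\sigma(w)^2 - 1\right)},$$
$$\eta_0 = M'_\tau\, T_\sigma(x)\, T_\sigma(y) = \frac{(xyz)^\tau\, T_\sigma(x)\, T_\sigma(y)}{1 - xyz - (xyz)^\tau\left(T_\sigma(x)\, T_\sigma(y) - 1\right)}.$$
According to the generating function
$$G(u, v, z) = \sum_{t_1, t_2, h} G(t_1, t_2, h)\, u^{t_1} v^{t_2} z^h,$$
we have
$$J_{\sigma,\tau}(x, y, z) = T_\sigma(x)\, T_\sigma(y)\, G(\eta(x), \eta(y), \eta_0),$$
and the theorem follows. □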
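To see (5.1) at work, specialize to $x = y = z$, exterior-arc variable $1$ and $\tau = \sigma$. Then $\eta(z)$ and $\eta_0$ coincide, and the coefficients of $T_\sigma(z)^2\, G(\eta(z), \eta(z), \eta(z))$ count joint structures by their total number of vertices. The sketch below expands this with truncated integer power series; the helper names are ours. It takes $T_\sigma$ from Theorem 3 and the shape counts $G(t, t, t)$ from the quadratic equation that Theorem 4 yields for $u = v = z$. For $\sigma = 1$ and up to four vertices the values $1, 2, 4, 10, 26$ can be confirmed by hand.

```python
from math import comb


def mul(a, b):
    """Product of two truncated power series (equal-length coefficient lists)."""
    n = len(a)
    c = [0] * n
    for i, ai in enumerate(a):
        if ai:
            for j in range(n - i):
                c[i + j] += ai * b[j]
    return c


def inverse(a):
    """1 / a(z) for a series with a[0] == 1."""
    assert a[0] == 1
    b = [1] + [0] * (len(a) - 1)
    for k in range(1, len(a)):
        b[k] = -sum(a[i] * b[k - i] for i in range(1, k + 1))
    return b


def T_sigma(sigma, N):
    """Coefficients of T_sigma(z) up to z^N via Theorem 3 (requires N >= 2 * sigma)."""
    den = [1] + [0] * N
    den[2] -= 1
    den[2 * sigma] += 1                      # z^(2 sigma) - z^2 + 1
    num = [0] * (N + 1)
    num[2 * sigma] = 1
    uz2 = mul(num, inverse(den))             # u_sigma(z) z^2
    a = uz2[:]
    a[0] += 1
    a[1] -= 1                                # u_sigma(z) z^2 - z + 1
    inv_a = inverse(a)
    x = mul(uz2, mul(inv_a, inv_a))
    F, power = [0] * (N + 1), [1] + [0] * N
    for k in range(N + 1):                   # F(x) = sum_k Catalan(k) x^k
        F = [f + comb(2 * k, k) // (k + 1) * p for f, p in zip(F, power)]
        power = mul(power, x)
    return mul(inv_a, F)


def shape_counts(N):
    """U(0..N) from (z^2 + 2z) U^2 - (z^2 + 3z + 1) U + (1 + z)^2 = 0."""
    U, S = [0] * (N + 1), [0] * (N + 1)      # S holds the coefficients of U^2
    for n in range(N + 1):
        val = (1, 2, 1)[n] if n <= 2 else 0   # coefficients of (1 + z)^2
        if n >= 1:
            val += 2 * S[n - 1] - 3 * U[n - 1]
        if n >= 2:
            val += S[n - 2] - U[n - 2]
        U[n] = val
        S[n] = sum(U[i] * U[n - i] for i in range(n + 1))
    return U


def joint_counts(sigma, N):
    """Coefficients of T_sigma(z)^2 U(eta(z)), i.e. (5.1) at x = y = z, z_ext = 1."""
    T = T_sigma(sigma, N)
    T2 = mul(T, T)
    z2s = [0] * (N + 1)
    z2s[2 * sigma] = 1                        # z^(2 sigma)
    T2m1 = T2[:]
    T2m1[0] -= 1                              # T_sigma^2 - 1
    den = [1] + [0] * N
    den[2] -= 1                               # 1 - z^2 ...
    den = [d - e for d, e in zip(den, mul(z2s, T2m1))]   # ... - z^(2 sigma)(T^2 - 1)
    eta = mul(mul(z2s, T2), inverse(den))
    U = shape_counts(N)
    comp, power = [0] * (N + 1), [1] + [0] * N
    for l in range(N + 1):                    # U(eta) = sum_l U(l) eta^l
        comp = [c + U[l] * p for c, p in zip(comp, power)]
        power = mul(power, eta)
    return mul(T2, comp)


print(joint_counts(1, 4))    # [1, 2, 4, 10, 26]
print(joint_counts(2, 4))    # [1, 2, 3, 4, 6]
```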

6. Asymptotic analysis 

6.1. The supercritical paradigm. Let $U(z) = G(z, z, z)$. We view $U(z)$ as a generating function, $U(z) = \sum_{l \ge 0} U(l)\, z^l$, where $U(l)$ denotes the number of shapes having $l$ arcs (interior and exterior). Setting $u = v = z$ in Theorem 4 and dividing by the common factor $1 + z$, we find that $U(z)$ satisfies
$$(z^2 + 2z)\, U(z)^2 - (z^2 + 3z + 1)\, U(z) + (1 + z)^2 = 0.$$
Solving this functional equation, and choosing the sign of the square root such that $U(0) = 1$, we derive
$$U(z) = \frac{1 + 3z + z^2 - \sqrt{1 - 2z - 9z^2 - 10z^3 - 3z^4}}{2z(z + 2)}. \tag{6.1}$$
It is straightforward to verify that the dominant singularity $\rho$ of $U(z)$ is the minimal positive real solution of $1 - 2z - 9z^2 - 10z^3 - 3z^4 = 0$, with $\rho \approx 0.22144$, see Fig. 8.
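A minimal numerical sketch of these two facts follows; it is our own illustration. Bisection on the quartic under the square root locates $\rho$. The coefficients $U(l)$ follow from the quadratic equation rewritten as $U = (1 + z)^2 + (z^2 + 2z)U^2 - (z^2 + 3z)U$, which determines each coefficient from the previous ones. The ratio $U(l)/U(l-1)$ slowly approaches $1/\rho \approx 4.516$.

```python
def radicand(z):
    """D(z) = 1 - 2z - 9z^2 - 10z^3 - 3z^4, the expression under the root in (6.1)."""
    return 1 - 2 * z - 9 * z**2 - 10 * z**3 - 3 * z**4


def bisect_root(f, lo, hi, iters=100):
    """Bisection assuming f(lo) > 0 > f(hi); D is decreasing on [0, 1/2]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def shape_counts(N):
    """U(0..N) from U = (1 + z)^2 + (z^2 + 2z) U^2 - (z^2 + 3z) U (exact integers)."""
    U, S = [0] * (N + 1), [0] * (N + 1)      # S holds the coefficients of U^2
    for n in range(N + 1):
        val = (1, 2, 1)[n] if n <= 2 else 0
        if n >= 1:
            val += 2 * S[n - 1] - 3 * U[n - 1]
        if n >= 2:
            val += S[n - 2] - U[n - 2]
        U[n] = val
        S[n] = sum(U[i] * U[n - i] for i in range(n + 1))
    return U


rho = bisect_root(radicand, 0.0, 0.5)
U = shape_counts(200)
print(f"rho = {rho:.5f}")                    # rho = 0.22144
print(shape_counts(5))                       # [1, 1, 2, 5, 16, 53]
print(U[200] / U[199], 1 / rho)
```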



For our computations the following instance of the supercritical paradigm (Flajolet, 2007) is of central importance. We are given a D-finite function $f(z)$ and an algebraic function $g(u)$ satisfying $g(0) = 0$. Furthermore, we suppose that $f(g(u))$ has a unique real-valued dominant singularity $\gamma$ and that $g$ is regular in a disc with radius slightly larger than $\gamma$. Then the supercritical paradigm stipulates that, given that $g(u)$ satisfies certain conditions, the subexponential factors of the coefficients of $f(g(u))$ at $u = 0$ coincide with those of $f(z)$.
Fig. 8. Universality of the square root. We display the dominant singularity of the generating function $U(z)$ of shapes (here at $\rho \approx 0.22144$). All singularities arising from compositions with the "outer" function $U(z)$ that are governed by the supercritical paradigm produce this type of singularity, leading to the subexponential factor $n^{-3/2}$.

Lemma 1. Let $\vartheta(z)$ be an algebraic function that is analytic for $|z| < r$ and satisfies $\vartheta(0) = 0$. In addition, suppose $\gamma$ is the unique dominant singularity of $U(\vartheta(z))$ and the minimal positive real solution of $\vartheta(z) = \rho$, with $|\gamma| < r$ and $\vartheta'(\gamma) \ne 0$. Then $U(\vartheta(z))$ has a singular expansion and
$$[z^n]\, U(\vartheta(z)) \sim c\, n^{-3/2} \left(\gamma^{-1}\right)^n, \tag{6.2}$$
where $c$ is some constant.

Proof. Since $\vartheta(z)$ is an algebraic function with $\vartheta(0) = 0$ and $U(z)$ is algebraic, whence D-finite, the composition $U(\vartheta(z))$ is D-finite. In particular, $U(\vartheta(z))$ has a singular expansion.

Next, we calculate the singular expansion of the composite function $U(\vartheta(z))$. In view of $[z^n]f(z) = \gamma^{-n}[z^n]f(\gamma z)$, it suffices to analyze the function $U(\vartheta(\gamma z))$ and to subsequently rescale in order to obtain the correct exponential factor. For this purpose we set
$$\tilde\vartheta(z) = \vartheta(\gamma z),$$
where $\vartheta(z)$ is analytic in $|z| < r$. Consequently $\tilde\vartheta(z)$ is analytic in $|z| < \tilde r = r/\gamma$, where $\tilde r > 1$. The singular expansion of $U(z)$, for $z \to \rho$, is given by
$$U(z) = u_0 + u_1(\rho - z)^{1/2}\,(1 + o(1)).$$
By construction $U(\vartheta(\gamma z)) = U(\tilde\vartheta(z))$, and $U(\tilde\vartheta(z))$ has its unique dominant singularity at $z = 1$. We have the Taylor expansion of $\tilde\vartheta(z)$ at $z = 1$:
$$\rho - \tilde\vartheta(z) = \sum_{n \ge 1} \tilde\vartheta_n (1 - z)^n = \tilde\vartheta_1 (1 - z)\,(1 + o(1)). \tag{6.3}$$
As for the singular expansion of $U(\tilde\vartheta(z))$, substituting (6.3) into the singular expansion of $U(z)$ gives, for $z \to 1$,
$$U(\tilde\vartheta(z)) = u_0 + u_1\, \tilde\vartheta_1^{1/2} (1 - z)^{1/2}\,(1 + o(1)),$$
where $\tilde\vartheta_1 = \tilde\vartheta'(z)|_{z=1} = \gamma\, \vartheta'(z)|_{z=\gamma} \ne 0$. By Theorem 1 and Theorem 2 we arrive at
$$[z^n]\, U(\tilde\vartheta(z)) \sim c\, n^{-3/2}$$
for some constant $c$. Finally, we use the scaling property of Taylor expansions in order to derive
$$[z^n]\, U(\vartheta(z)) = \left(\gamma^{-1}\right)^n [z^n]\, U(\tilde\vartheta(z)),$$
and the proof is complete. □
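The simplest case is $\vartheta(z) = z$, so that $\gamma = \rho$. There the lemma predicts $U(n) \sim c\, n^{-3/2} \rho^{-n}$ with $c = u_1 \rho^{1/2}/\Gamma(-1/2)$. Here $u_1 = -\sqrt{-D'(\rho)}/(2\rho(\rho + 2))$, and $D(z) = 1 - 2z - 9z^2 - 10z^3 - 3z^4$ is the radicand in (6.1). The sketch below checks this prediction numerically. To avoid overflow it works with the rescaled coefficients $U(n)\rho^n$, which is exactly the rescaling step of the proof. It is an illustration under these assumptions, not part of the original argument.

```python
import math


def radicand(z):
    return 1 - 2 * z - 9 * z**2 - 10 * z**3 - 3 * z**4


def radicand_prime(z):
    return -2 - 18 * z - 30 * z**2 - 12 * z**3


def bisect_root(f, lo, hi, iters=100):
    """Bisection assuming f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def scaled_shape_counts(rho, N):
    """v_n = U(n) * rho**n from the recurrence for U, rescaled so that v_n stays O(1)."""
    v, s = [0.0] * (N + 1), [0.0] * (N + 1)   # s_k = sum_i v_i v_(k-i)
    for n in range(N + 1):
        val = ((1.0, 2.0, 1.0)[n] if n <= 2 else 0.0) * rho**n
        if n >= 1:
            val += rho * (2 * s[n - 1] - 3 * v[n - 1])
        if n >= 2:
            val += rho**2 * (s[n - 2] - v[n - 2])
        v[n] = val
        s[n] = sum(v[i] * v[n - i] for i in range(n + 1))
    return v


rho = bisect_root(radicand, 0.0, 0.5)
u1 = -math.sqrt(-radicand_prime(rho)) / (2 * rho * (rho + 2))
c = u1 * math.sqrt(rho) / math.gamma(-0.5)     # predicted constant in (6.2)
v = scaled_shape_counts(rho, 600)
for n in (50, 100, 200, 400, 600):
    print(n, n**1.5 * v[n] / c)                # tends to 1 as n grows
```

The printed ratios approach 1 at rate $O(1/n)$, consistent with the correction terms in Theorem 2.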

We remark that Lemma 1 allows us, under certain conditions, to obtain the asymptotics of the coefficients of supercritical compositions of the "outer" function $U(z)$ and an "inner" function $\vartheta(z)$. The scenario considered here is tailored to the asymptotics of $J_\sigma(s)$.

6.2. Asymptotics of $J_\sigma(s)$. In this section we assume $\sigma = \tau$. Let $J_\sigma(s)$ denote the number of joint structures with a total of $s$ vertices having arc-length $\ge 2$, stack-length $\ge \sigma$ and exterior stack-length $\ge \sigma$, with generating function
$$\mathbf{J}_\sigma(z) = \sum_{s \ge 0} J_\sigma(s)\, z^s.$$
By definition, we have
$$\mathbf{J}_\sigma(z) = J_{\sigma,\sigma}(z, z, 1).$$

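Theorem 6. For $\sigma \ge 1$, we have
$$\mathbf{J}_\sigma(z) = T_\sigma(z)^2\, U(\zeta(z)), \tag{6.4}$$
where
$$\zeta(z) = \frac{z^{2\sigma}\, T_\sigma(z)^2}{1 - z^2 - z^{2\sigma}\left(T_\sigma(z)^2 - 1\right)}. \tag{6.5}$$
Furthermore, for $1 \le \sigma \le 9$, $J_\sigma(s)$ satisfies
$$J_\sigma(s) \sim c_\sigma\, s^{-3/2} \left(\gamma_\sigma^{-1}\right)^s \quad \text{for some } c_\sigma, \tag{6.6}$$
where $\gamma_\sigma$ is the minimal positive real solution of the equation $\zeta(z) = \rho$, see Table 1. In particular, $c_1 \approx 1.6527921$ and $c_2 \approx 4.3011932$.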

Proof. By Theorem [5] and the definition, we have 

= TAzfG{az),az),az)) 

= T.(z)^U(C(z)), 

where 

C(z) = - '"^ ' 

Since $T_\sigma(z)$ is algebraic, we can conclude from the closure properties of algebraic functions that $\zeta(z)$ is algebraic, whence $U(\zeta(z))$ and $J_\sigma(z)$ are D-finite. Pringsheim's Theorem (Titchmarsh, 1939) guarantees that $J_\sigma(z)$ has a dominant real positive singularity $\gamma_\sigma$. We verify that, for $1 \le \sigma \le 9$, the minimal, positive real solution of the equation $\zeta(z) = \rho$ is strictly smaller than the singularity of $\zeta(z)$, which is actually the singularity of $T_\sigma(z)$. Hence $\gamma_\sigma$ is the unique, minimal, positive real solution of the equation $\zeta(z) = \rho$, and it is straightforward to check that $\zeta'(z)|_{z=\gamma_\sigma} \neq 0$. Therefore the composite function $U(\zeta(z))$ is governed by the supercritical paradigm of the preceding lemma. Furthermore $T_\sigma(z)$ is analytic at $\gamma_\sigma$, whence the subexponential factors of $T_\sigma(z)^2\, U(\zeta(z))$ coincide with those of the function $U(z)$. Consequently,

$$J_\sigma(s) \sim c_\sigma\, s^{-3/2}\, (\gamma_\sigma^{-1})^s \quad \text{for some } c_\sigma.$$

The values of $\gamma_\sigma^{-1}$ are listed in Table 1. It remains to calculate the constant coefficient in the asymptotic formula. We use the singular expansion of $U(z)$ around $\rho$ and the Taylor expansions of $\zeta(z)$ and $T_\sigma(z)^2$ around $\gamma_\sigma$:

$$U(z) = u_0 + u_1(\rho - z)^{1/2} + O(\rho - z),$$

$$\zeta(z) - \rho = \varsigma_1(z - \gamma_\sigma) + O((z - \gamma_\sigma)^2),$$

$$T_\sigma(z)^2 = t_0 + t_1(\gamma_\sigma - z) + O((\gamma_\sigma - z)^2).$$

We proceed by substituting these expansions into $T_\sigma(z)^2\, U(\zeta(z))$:

$$J_\sigma(z) = t_0 u_0 + t_0 u_1 \varsigma_1^{1/2} (\gamma_\sigma - z)^{1/2} + O(\gamma_\sigma - z).$$
Table 1. Exponential growth rates $\gamma_\sigma^{-1}$ for joint structures with arc-length $\ge 2$, having both stack-length and exterior stack-length $\ge \sigma$.

σ           1        2        3        4        5        6        7        8        9
γ_σ^{-1}    3.48766  2.24338  1.86724  1.67974  1.56544  1.48763  1.43083  1.38731  1.35276



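Using Theorem 1 and Theorem 2 we have

$$J_\sigma(s) \sim \frac{t_0 u_1 \varsigma_1^{1/2} \gamma_\sigma^{1/2}}{\Gamma(-\frac{1}{2})}\, s^{-3/2}\, (\gamma_\sigma^{-1})^s.$$

Setting $c_\sigma = \frac{t_0 u_1 \varsigma_1^{1/2}\gamma_\sigma^{1/2}}{\Gamma(-1/2)} = -\frac{t_0 u_1 \varsigma_1^{1/2}\gamma_\sigma^{1/2}}{2\sqrt{\pi}}$, we compute $c_1 \approx 1.6527921$ and $c_2 \approx 4.3011932$, completing the proof of Theorem 6. □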
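The constant above comes from the transfer step $[z^s](\gamma_\sigma - z)^{1/2} = \gamma_\sigma^{1/2}\gamma_\sigma^{-s}\,[z^s](1 - z)^{1/2}$ together with $[z^s](1 - z)^{1/2} \sim s^{-3/2}/\Gamma(-\frac12)$. The following sketch checks the latter numerically and packages the formula for $c_\sigma$. The parameters `t0`, `u1`, `varsigma1` and `gamma` stand for the expansion coefficients named in the proof; the numbers passed in below are placeholders, since the values behind $c_1$ and $c_2$ are not reproduced here. The function names are hypothetical.

```python
import math


def sqrt_series(n_max):
    """Taylor coefficients [z^s] (1 - z)^(1/2) for s = 0..n_max."""
    coeffs = [1.0]
    for s in range(1, n_max + 1):
        # binomial series: c_s / c_{s-1} = (s - 1 - alpha) / s with alpha = 1/2
        coeffs.append(coeffs[-1] * (s - 1.5) / s)
    return coeffs


def transfer_ratio(s, coeffs):
    """Exact coefficient divided by the transfer prediction s^{-3/2} / Gamma(-1/2)."""
    return coeffs[s] / (s ** -1.5 / math.gamma(-0.5))


def asymptotic_constant(t0, u1, varsigma1, gamma):
    """c_sigma = t0 * u1 * varsigma1^{1/2} * gamma^{1/2} / Gamma(-1/2)."""
    return t0 * u1 * math.sqrt(varsigma1 * gamma) / math.gamma(-0.5)


coeffs = sqrt_series(2000)
print(coeffs[:3])                    # 1, -1/2, -1/8
print(transfer_ratio(2000, coeffs))  # close to 1
print(math.gamma(-0.5), -2 * math.sqrt(math.pi))
```

Since $u_1 < 0$ whenever $U(z)$ has nonnegative coefficients and a square-root singularity, and $\Gamma(-\frac12) < 0$, the resulting $c_\sigma$ is positive, as it must be.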

We next observe that eq. (6.4) allows us to derive a functional equation for $J_\sigma(z)$, which in turn gives a recurrence for $J_\sigma(s)$.

Corollary 1. For $\sigma \ge 1$, the generating function $J_\sigma(z)$ satisfies the functional equation

$$A(z)\, J_\sigma(z)^2 + B(z)\, J_\sigma(z) + C(z) = 0, \tag{6.7}$$

where 

A{z) =z2- (2 - 2^ + 2^2- _ z2-T,(;,)2)^ 
B{z) =-[l-2z' + z^ + i2 + T^{zf)z^'' 

- (2 + T.(^)2)^2H-2. ^ (1 ^ T.^^)2 _ T,(^)4)/<. j ^ 

C(z) =(l-z^ + z^n^. 



Furthermore, the number $J_\sigma(s)$ of joint structures with a total of $s$ vertices satisfies the following recurrence:

$$J_\sigma(s) = c(s) + \sum_{i=1}^{s} b(i)\, J_\sigma(s-i) + \sum_{i=1}^{s} \sum_{j=0}^{s-i} a(i)\, J_\sigma(j)\, J_\sigma(s-i-j), \tag{6.8}$$

where $a(s)$, $b(s)$ and $c(s)$ are the coefficients of $z^s$ in $A(z)$, $B(z)$ and $C(z)$, respectively.

In Table 2 we list the numbers of joint structures $J_1(s)$ and $J_2(s)$ for $s = 1, \ldots, 12$.

Proof. Substituting $\zeta(z)$ for $z$ in the algebraic equation satisfied by $U(z)$ and using eq. (6.4), we obtain eq. (6.7). Note that $a(0) = 0$ and $b(0) = -1$. Extracting the coefficients of $z^s$ in eq. (6.7), the recurrence follows immediately. □
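Recurrence (6.8) can be run mechanically once the coefficient lists of $A(z)$, $B(z)$ and $C(z)$ are known. The sketch below is generic: since the polynomials of Corollary 1 are not reproduced here, it is exercised on two classical equations of the same shape that also satisfy $a(0) = 0$ and $b(0) = -1$, namely $zC^2 - C + 1 = 0$ (Catalan numbers) and $z^2M^2 + (z - 1)M + 1 = 0$ (Motzkin numbers). The function name is hypothetical.

```python
def solve_quadratic_recurrence(a, b, c, n_max):
    """Coefficients J(0..n_max) of the power series J(z) with A J^2 + B J + C = 0.

    a, b, c are the coefficient lists of A(z), B(z), C(z) (missing entries are 0).
    Implements recurrence (6.8), which needs a(0) = 0 and b(0) = -1.
    """
    def coef(seq, k):
        return seq[k] if k < len(seq) else 0

    if coef(a, 0) != 0 or coef(b, 0) != -1:
        raise ValueError("recurrence (6.8) needs a(0) = 0 and b(0) = -1")
    J = []
    for s in range(n_max + 1):
        value = coef(c, s)
        for i in range(1, s + 1):
            value += coef(b, i) * J[s - i]          # linear term b(i) J(s - i)
            ai = coef(a, i)
            if ai:                                  # quadratic term, only earlier J
                value += ai * sum(J[j] * J[s - i - j] for j in range(s - i + 1))
        J.append(value)
    return J


print(solve_quadratic_recurrence([0, 1], [-1], [1], 6))        # Catalan
print(solve_quadratic_recurrence([0, 0, 1], [-1, 1], [1], 6))  # Motzkin
```

Because $a(0) = 0$ and $b(0) = -1$, every term on the right-hand side only involves $J_\sigma(0), \ldots, J_\sigma(s-1)$, so the values can be computed in increasing order of $s$.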
Table 2. The numbers of joint structures $J_1(s)$ and $J_2(s)$ over a total number of $s = 1, \ldots, 12$ nucleotides.

s        1   2   3    4    5    6     7     8     9      10     11      12
J_1(s)   2   4   10   26   70   194   550   1590  4674   13940  42106   128610
J_2(s)   2   3   4    6    12   26    54    105   200    389    780     1589




Fig. 9. Exact enumeration versus asymptotic formula. We plot the number of joint structures with arc-length $\ge 2$ and stack-length $\ge 1$ ($J_1(s)$) versus its asymptotic formula $c\, s^{-3/2}\, 3.48766^s$ (left) and $J_2(s)$ versus $c\, s^{-3/2}\, 2.24338^s$ (right), for $s$ up to 100. For representational purposes we separate the curves via suitable choices of the respective constants $c$.

In Fig. 9 we show that our asymptotic formulas work well already for small sequence lengths. Here we contrast the exact values, $J_1(s)$ and $J_2(s)$, with the asymptotic formulas given via Theorem 6:

$$J_1(s) \sim c_1\, s^{-3/2}\, 3.48766^s \quad \text{and} \quad J_2(s) \sim c_2\, s^{-3/2}\, 2.24338^s.$$
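The same comparison can be made numerically with the exact values of Table 2 and the constants of Theorem 6. The sketch below only divides each exact value by the leading asymptotic term; the helper names are hypothetical.

```python
# Exact values from Table 2 (s = 1, ..., 12) and constants from Theorem 6.
J1 = [2, 4, 10, 26, 70, 194, 550, 1590, 4674, 13940, 42106, 128610]
J2 = [2, 3, 4, 6, 12, 26, 54, 105, 200, 389, 780, 1589]
C1, GROWTH1 = 1.6527921, 3.48766
C2, GROWTH2 = 4.3011932, 2.24338


def asymptotic(c, growth, s):
    """Leading term c * s^{-3/2} * growth^s of eq. (6.6)."""
    return c * s ** -1.5 * growth ** s


def exact_over_asymptotic(exact, c, growth):
    """Ratios J(s) / (c s^{-3/2} growth^s) for s = 1, ..., len(exact)."""
    return [exact[s - 1] / asymptotic(c, growth, s) for s in range(1, len(exact) + 1)]


for name, data, c, g in (("J_1", J1, C1, GROWTH1), ("J_2", J2, C2, GROWTH2)):
    ratios = exact_over_asymptotic(data, c, g)
    print(name, " ".join(f"{r:.3f}" for r in ratios))
```

At $s = 12$ the ratio is within 1% of 1 for $J_1(s)$ and within 10% for $J_2(s)$, consistent with the claim that the formulas are already useful for short sequences.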



7. Discussion 

The discovery of more and more instances of regulatory actions among RNA molecules makes evident that RNA-RNA interaction is a problem of central importance. While it is well known how to MFE-fold these interaction structures (Alkan et al., 2006; Huang et al., 2009), this paper constitutes progress with respect to the theoretical understanding of RNA-RNA interaction structures. Insight into the combinatorics of joint structures allows a deeper understanding of the analysis and design of folding algorithms as well as of algorithmic approximations.

At first sight, it should be straightforward to derive the generating function of joint structures from the (eleven) recursion relations of the original rip-grammar (implied by Proposition 2) (Huang et al., 2009). While this is in principle correct, the mere statement of the generating function derived this way fills several pages. This approach is neither suitable for deriving asymptotic formulas nor does it allow us to deal with specific stack-length conditions. In fact, the extraction of its coefficients would present a nontrivial task.



We do not use the recurrences of (Huang et al., 2009) directly. Instead we build our theory of joint structures around the concept of shapes. The key to all results is the simple shape grammar of joint structures. The basic idea here is that collapsing stems preserves vital information about the interaction structure. Given a shape, a joint structure can be obtained via inflation, see Theorem 5.



While there exists a notion of shapes for RNA secondary structures (Giegerich et al., 2002), their combinatorics is not shape-based: everything is organized around recurrences, which oftentimes hide deeper structural insights and connections. As a result, symbolic enumeration has not been employed to derive the generating function of RNA secondary structures.

In contrast, RNA pseudoknot structures (Reidys et al., 2010) represent a shape-based structure class (here a further complication enters the picture, as the generating function of their shapes can only be computed via the reflection principle).

The theory of joint structures presented here resembles features of the theory of modular 
diagrams and is in particular shape-based. However, the shapes of joint structures are 
governed by simple algebraic generating functions and satisfy a simple recurrence. 

Let us finally outline future research. We currently study the generating function of canonical joint structures having minimum arc-length four. This derivation requires a more detailed look at shapes of joint structures, since additional variables have to be introduced. The purpose of these variables is to distinguish specific inflation scenarios.
8. Further results

In this section we generalize our results to joint structures with arc-length $\ge \lambda$, interior stack-length $\ge \sigma$, and exterior stack-length $\ge \tau$. Let $\mathcal{T}^{[\lambda]}_\sigma$ denote the combinatorial class of $\sigma$-canonical secondary structures having arc-length $\ge \lambda$, and let $t^{[\lambda]}_\sigma(n)$ denote the number of all $\sigma$-canonical secondary structures with $n$ vertices having arc-length $\ge \lambda$.



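Theorem 7. Let $\sigma \in \mathbb{N}$, let $z$ be an indeterminate and let

$$u_\sigma(z) = \frac{(z^2)^{\sigma-1}}{z^{2\sigma} - z^2 + 1}, \qquad v_\lambda(z) = 1 - z + \sum_{h=2}^{\lambda} u_\sigma(z)\, z^h;$$

then $T^{[\lambda]}_\sigma(z)$, the generating function of $\sigma$-canonical structures with minimum arc-length $\lambda$, is given by

$$T^{[\lambda]}_\sigma(z) = \frac{1}{v_\lambda(z)}\, F\!\left(\frac{\sqrt{u_\sigma(z)}\, z}{v_\lambda(z)}\right),$$

where

$$F(z) = \frac{1 - \sqrt{1 - 4z^2}}{2z^2}.$$

Theorem 7 implies that $T^{[\lambda]}_\sigma(z)$ is an algebraic function for any specified $\lambda$ and $\sigma$, since $F(z)$ is algebraic and $v_\lambda(z)$, $u_\sigma(z)$ are both rational functions.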
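Because $F(y) = 1 + y^2 F(y)^2$, the formula of Theorem 7 is equivalent to the fixed-point equation $v_\lambda(z)\,T = 1 + u_\sigma(z)\,z^2\,T^2$ with $T = T^{[\lambda]}_\sigma(z)$. The sketch below iterates this equation on truncated integer power series and compares the result with a brute-force count. It assumes that the arc-length of an arc $(i, j)$ is $j - i$ and that $\sigma$-canonical means every maximal stack contains at least $\sigma$ arcs; all function names are hypothetical.

```python
def series_mul(p, q, n):
    """Product of two power series, truncated after z^n."""
    r = [0] * (n + 1)
    for i, pi in enumerate(p[: n + 1]):
        if pi:
            for j, qj in enumerate(q[: n + 1 - i]):
                r[i + j] += pi * qj
    return r


def series_inv(p, n):
    """Inverse of a power series with constant term 1, truncated after z^n."""
    inv = [1] + [0] * n
    for k in range(1, n + 1):
        inv[k] = -sum(p[i] * inv[k - i] for i in range(1, min(k, len(p) - 1) + 1))
    return inv


def canonical_series(sigma, lam, n):
    """Coefficients of T_sigma^[lam](z) up to z^n from v_lam T = 1 + u_sigma z^2 T^2."""
    # u_sigma(z) z^2 = z^(2 sigma) / (z^(2 sigma) - z^2 + 1)
    den, num = [1] + [0] * n, [0] * (n + 1)
    if n >= 2:
        den[2] -= 1
    if 2 * sigma <= n:
        den[2 * sigma] += 1
        num[2 * sigma] = 1
    uz2 = series_mul(num, series_inv(den, n), n)
    # v_lam(z) = 1 - z + u_sigma(z) (z^2 + ... + z^lam)
    v = [1] + [0] * n
    if n >= 1:
        v[1] = -1
    for h in range(lam - 1):
        for k in range(n + 1 - h):
            v[k + h] += uz2[k]
    inv_v = series_inv(v, n)
    t = [1] + [0] * n
    for _ in range(n + 1):  # each pass fixes at least one more coefficient
        rhs = series_mul(uz2, series_mul(t, t, n), n)
        rhs[0] += 1
        t = series_mul(rhs, inv_v, n)
    return t


def count_brute(sigma, lam, n):
    """Count sigma-canonical noncrossing structures on n vertices, arc-length >= lam."""
    def structures(i, j):
        # all noncrossing arc sets on vertices i..j-1
        if i >= j:
            yield frozenset()
            return
        yield from structures(i + 1, j)                 # vertex i unpaired
        for k in range(i + lam, j):                     # arc (i, k), k - i >= lam
            for inner in structures(i + 1, k):
                for outer in structures(k + 1, j):
                    yield inner | outer | {(i, k)}

    def is_canonical(arcs):
        for a, b in arcs:
            if (a - 1, b + 1) in arcs:
                continue                                # not the outermost arc of its stack
            length = 0
            while (a + length, b - length) in arcs:
                length += 1
            if length < sigma:
                return False
        return True

    return sum(1 for s in structures(0, n) if is_canonical(s))


print(canonical_series(1, 2, 10))
print([count_brute(2, 3, n) for n in range(8)], canonical_series(2, 3, 7))
```

The series iteration avoids square roots of power series entirely, so all coefficients stay exact integers; the brute force is exponential and only meant as an independent check for small $n$.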

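We are now in a position to establish a generalization of Theorem 5 that allows us to compute the generating function $J^{[\lambda]}_{\sigma,\tau}(x, y, z)$ for $\lambda \le \tau + 1$.

Let $\mathcal{J}^{[\lambda]}_{\sigma,\tau}$ denote the class of joint structures with arc-length $\ge \lambda$, interior stack-length $\ge \sigma$, and exterior stack-length $\ge \tau$. Let $j^{[\lambda]}_{\sigma,\tau}(n, m, h)$ denote the number of joint structures in $\mathcal{J}^{[\lambda]}_{\sigma,\tau}$ having $n$ vertices in the top, $m$ vertices in the bottom, and $h$ exterior arcs, with generating function

$$J^{[\lambda]}_{\sigma,\tau}(x, y, z) = \sum_{n,m,h} j^{[\lambda]}_{\sigma,\tau}(n, m, h)\, x^n y^m z^h.$$

Theorem 8. For $\sigma \ge 1$, $\tau \ge 1$, $\lambda \le \tau + 1$, we have

$$J^{[\lambda]}_{\sigma,\tau}(x, y, z) = T^{[\lambda]}_\sigma(x)\, T^{[\lambda]}_\sigma(y)\, G(\eta(x), \eta(y), v_0), \tag{8.1}$$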
where

TjiWj = — 

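$$v_0 = \frac{(xyz)^\tau\, T^{[\lambda]}_\sigma(x)\, T^{[\lambda]}_\sigma(y)}{1 - xyz - (xyz)^\tau\,\big(T^{[\lambda]}_\sigma(x)\, T^{[\lambda]}_\sigma(y) - 1\big)}.$$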
Proof. Using the notation and approach of Theorem 5, we arrive at
3<:, = X Seq (Jl) 
X. = X. X ((tW)2 _ 1) 
M^ = %^x Seq (X^) 
x; = 3^5 X Seq {%) 
K = X ((tW)2 _ 1) 
= x; X Seq (X) 

The only difference is that $T^{[\lambda]}_\sigma$ replaces $T_\sigma$, which ensures that the structures have arc-length $\ge \lambda$. The key point here is that the restriction $\lambda \le \tau + 1$ guarantees that any 2-arc in $\gamma$ has, after inflation, a minimum arc-length of $\tau + 1 \ge \lambda$.
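Therefore, the generating function of the class $\mathcal{J}^{[\lambda]}_{\sigma,\tau}$ satisfies

$$J^{[\lambda]}_{\sigma,\tau}(x, y, z) = T^{[\lambda]}_\sigma(x)\, T^{[\lambda]}_\sigma(y)\, G(\eta(x), \eta(y), v_0),$$

where $\eta(w)$ is as in the statement of Theorem 8 and

$$v_0 = \frac{(xyz)^\tau\, T^{[\lambda]}_\sigma(x)\, T^{[\lambda]}_\sigma(y)}{1 - xyz - (xyz)^\tau\,\big(T^{[\lambda]}_\sigma(x)\, T^{[\lambda]}_\sigma(y) - 1\big)}.$$

□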
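One way to read the exterior factor $v_0$, offered as an interpretation consistent with the $\mathrm{Seq}$ constructions of the grammar above rather than as the authors' own derivation: with $q = xyz$ and $P = T^{[\lambda]}_\sigma(x)\,T^{[\lambda]}_\sigma(y)$, a stack of at least $\tau$ exterior arcs contributes $K = \sum_{k \ge \tau} q^k = q^\tau/(1 - q)$, consecutive stacks are assumed to be separated by a nonempty pair of secondary structures, contributing $P - 1$, and the stated formula for $v_0$ then follows by direct simplification:

```latex
\begin{align*}
v_0 &= P \cdot K \cdot \mathrm{Seq}\big(K\,(P - 1)\big)
     = \frac{P\,K}{1 - K\,(P - 1)},
     \qquad K = \frac{q^\tau}{1 - q}, \\
    &= \frac{P\,q^\tau}{(1 - q) - q^\tau\,(P - 1)}
     = \frac{(xyz)^\tau\, T^{[\lambda]}_\sigma(x)\, T^{[\lambda]}_\sigma(y)}
            {1 - xyz - (xyz)^\tau\,\big(T^{[\lambda]}_\sigma(x)\,T^{[\lambda]}_\sigma(y) - 1\big)}.
\end{align*}
```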



We remark that Theorem 8 immediately implies Theorem 5. Analogously, we have

Theorem 9. For $\lambda \le \sigma + 1$, we have

$$J^{[\lambda]}_\sigma(z) = T^{[\lambda]}_\sigma(z)^2\, U(\zeta(z)), \tag{8.2}$$

where

(8.3) C(^)- '^^'^ 
Table 3. Exponential growth rates $(\gamma^{[\lambda]}_\sigma)^{-1}$ for joint structures with arc-length $\ge \lambda$, having both stack-length and exterior stack-length $\ge \sigma$.

σ         1        2        3        4        5        6        7        8        9
λ = 1     3.77438  2.30663  1.89559  1.69615  1.57629  1.49541  1.43671  1.39194  1.35651
λ = 2     3.48766  2.24338  1.86724  1.67974  1.56544  1.48763  1.43083  1.38731  1.35276
λ = 3     0.00000  2.21090  1.84998  1.66876  1.55773  1.48187  1.42633  1.38368  1.34976
λ = 4     0.00000  0.00000  1.83971  1.66155  1.55233  1.47764  1.42291  1.38085  1.34737
λ = 5     0.00000  0.00000  0.00000  1.65691  1.54861  1.47459  1.42036  1.37867  1.34549

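Furthermore, for $1 \le \sigma \le 9$ and $1 \le \lambda \le 5$, $J^{[\lambda]}_\sigma(s)$ satisfies

$$J^{[\lambda]}_\sigma(s) \sim c^{[\lambda]}_\sigma\, s^{-3/2}\, \big((\gamma^{[\lambda]}_\sigma)^{-1}\big)^s \quad \text{for some } c^{[\lambda]}_\sigma, \tag{8.4}$$

where $\gamma^{[\lambda]}_\sigma$ is the minimal, positive real solution of the equation $\zeta(z) = \rho$, see Table 3. In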
particular, cf^ ^ 1.6527921, ^ 4.3011932, and cf^ ^ 3.8671841. 

Acknowledgments. We would like to thank F.W.D. Huang, E.Y. Jin and R.R. Wang 
for discussions. 